Accelovation Crawl Crawl data from Accelovation. This data is currently not publicly accessible. |
|
Alexa Crawls Crawl data donated by Alexa Internet. This data is currently not publicly accessible. |
|
Archive Team Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake... |
|
Archive-It Digital Collection The Archive-It Digital Collection |
|
archiveteam-mobileme Description Forthcoming |
|
Common Crawl Web crawl data from Common Crawl. |
|
Cuil Crawl Data Crawl data from cuil.com. |
|
Custom Crawl Services National library harvesting. |
|
Focused Crawls Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain. |
|
httparchive Successful societies and institutions recognize the need to record their history - this provides a way to review the past, find explanations for current behavior, and spot emerging trends. In... |
|
Institut national de l’audiovisuel Crawl data from Institut national de l’audiovisuel in France. This data is currently not publicly accessible. |
|
Internet Archive Web Crawls Crawl data collected by the Internet Archive. This data is currently not publicly accessible in this format. To view archived web pages, please visit the Wayback Machine. |
|
Internet Memory Foundation Crawl data from Internet Memory Foundation. This data is currently not publicly accessible. |
|
Mercator Crawl Crawl done with the DEC/HP-labs 'Mercator' crawler and converted to ARC format. This data is currently not publicly accessible. |
|
National Library of Australia Crawl National Library of Australia crawl. This data is currently not publicly accessible. |
|
Thumper Transfer Web crawl data transferred from thumpers in the Santa Clara data center. |
|
urlteam Web Crawls Crawl data collected by the urlteam. |
|
web-group-internal miscellaneous data |
|
Wikileaks.org Archive A collection of web pages from the wikileaks websites as well as news coverage and commentary surrounding the Wikileaks releases. It includes coverage of the Afghan war diaries, the Iraq war logs,... |
|
Wikimedia Downloads All downloads provided by the Wikimedia Foundation are available in this collection. Most of the files here originate from their designated download server. What is available? Wikimedia projects'... |
|
Wikimedia Foundation Media Media files from the Wikimedia Foundation. |
|
Wikipedia Dumps Data dumps of the wikipedia.org web site. |
|
WikiTeam WikiTeam software is a set of tools for archiving wikis. They work on MediaWiki wikis, but we want to expand to other wiki engines. As of April 2012, WikiTeam has preserved more than 500 wikis. ... |











